Classification result
Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior knowledge of specific attacks or may not work well on complex models due to their underlying assumptions. We argue that adversarial sample attacks are deeply entangled with the interpretability of DNN models: while classification results on benign inputs can be reasoned about based on human-perceptible features/attributes, results on adversarial samples can hardly be explained. We therefore propose a novel interpretability-based adversarial sample detection technique for face recognition models. It features a novel bi-directional correspondence inference between attributes and internal neurons to identify the neurons critical for individual attributes.
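As a toy illustration of the attribute-to-neuron direction of such a correspondence (the function name and the correlation-based criterion below are illustrative assumptions, not the paper's actual inference procedure), one could rank internal neurons by how strongly their activations correlate with a human-perceptible attribute across a set of benign inputs:

```python
import numpy as np

def critical_neurons(activations, attributes, top_k=3):
    """Rank neurons by |Pearson correlation| with a binary attribute.

    activations: (n_samples, n_neurons) internal activations
    attributes:  (n_samples,) 0/1 attribute labels (e.g., "wears glasses")
    Returns indices of the top_k most attribute-correlated neurons.
    """
    a = np.asarray(activations, dtype=float)
    y = np.asarray(attributes, dtype=float)
    a_c = a - a.mean(axis=0)          # center each neuron's activations
    y_c = y - y.mean()                # center the attribute labels
    denom = a_c.std(axis=0) * y_c.std() * len(y)
    # Pearson r per neuron; guard against zero-variance neurons
    corr = (a_c * y_c[:, None]).sum(axis=0) / np.where(denom == 0, 1, denom)
    return np.argsort(-np.abs(corr))[:top_k]
```

A detector in this spirit would then check whether the neurons critical for the predicted identity's attributes actually fire on a given input.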
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Many recent datasets contain a variety of different data modalities, for instance, image, question, and answer data in visual question answering (VQA). When training deep net classifiers on those multi-modal datasets, the modalities are exploited at different scales, i.e., some modalities contribute to the classification results more easily than others. This is suboptimal because the classifier is inherently biased towards a subset of the modalities. To alleviate this shortcoming, we propose a novel regularization term based on the functional entropy. Intuitively, this term encourages balancing the contribution of each modality to the classification result. However, regularizing with the functional entropy is challenging.
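The balancing intuition can be sketched with a far simpler proxy than the functional entropy itself (the function below and the use of plain Shannon entropy over per-modality contribution scores are illustrative assumptions, not the paper's formulation): normalize the modalities' contributions into a distribution and penalize its distance from maximum entropy, so the penalty vanishes only when all modalities contribute equally.

```python
import numpy as np

def balance_penalty(contributions, eps=1e-12):
    """Entropy-gap penalty over per-modality contribution scores.

    Minimizing this penalty (i.e., maximizing the entropy of the
    normalized contributions) pushes the classifier toward using
    all modalities equally rather than leaning on one of them.
    """
    c = np.asarray(contributions, dtype=float)
    p = c / (c.sum() + eps)               # normalize to a distribution
    entropy = -np.sum(p * np.log(p + eps))
    max_entropy = np.log(len(p))          # entropy of the uniform distribution
    return max_entropy - entropy          # 0 when balanced, > 0 otherwise

# A classifier leaning heavily on one modality is penalized more
# than one drawing on both modalities equally.
print(balance_penalty([0.9, 0.1]) > balance_penalty([0.5, 0.5]))  # prints: True
```

In training, such a term would be added to the classification loss, with the contribution scores derived from the model (e.g., per-modality gradient norms); the paper's functional-entropy formulation handles this more rigorously.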
- North America > Canada > Quebec > Montreal (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
- (2 more...)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > China > Shandong Province > Jinan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (0.94)
- Information Technology (0.67)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Reviewer #1: Thank you for the positive comments and suggestions! Below we address your questions in detail. "It would be better if the authors could try DropEdge and sampling methods instead of only adopting DropNode." Table 6 shows the classification results on the benchmarks. "It would be better if the authors could provide the performance under different training ratios."
- Asia > China > Sichuan Province > Chengdu (0.05)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)